test acc
002262941c9edfd472a79298b2ac5e17-Supplemental-Conference.pdf
A.1 Proof Sketch
We first introduce the following lemma: Lemma 1. Lemma 2. For matrices $A, B \in M_n$, if $A \preceq B$, then we have $\lambda_{\min}(A) \le \lambda_{\min}(B)$ and $\lambda_{\max}(A) \le \lambda_{\max}(B)$, where $\lambda_{\max}(\cdot)$ (resp., $\lambda_{\min}(\cdot)$) denotes taking the maximum (resp., minimum) eigenvalue.
Proof of Lemma 2. For any matrix $P \in M_n$ with $P^\top = P$, we have $\lambda_{\max}(P) = \max \dots$
We first consider the condition number of $\hat{H}$ when $X$ is in a locally convex area. By equations 3 and 4, we have $M_1 \preceq H \preceq M_2$. Rearranging the terms yields $H - M_1 \succeq 0$ and $M_2 - H \succeq 0$. Therefore, for any vector $x \in \mathbb{R}^M$, we have $\dots$
We next consider the minimum singular value of $H$ and $\hat{H}$ with $\sigma_{\min}(H) = \sqrt{\lambda_{\min}(H^2)}$ and $\sigma_{\min}(\hat{H}) = \sqrt{\lambda_{\min}(\hat{H}^2)}$ in any case. Under Assumption 1 and equation 4, we have $H \preceq M_2$. Similarly, we can obtain $-H \preceq M_2$. By Lemma 2, we further have $\lambda_{\max}(H) \le \lambda_{\max}(M_2) = \dots$
C.1 $\|\nabla \hat{f}(\hat{X})\|_2$ vs. $\|\nabla f(X)\|_2$
In this section, we explain why we use $\|\nabla \hat{f}(\hat{X})\|_2$ rather than $\|\nabla f(X)\|_2$ to characterize the convergence rate. In general, it is hard to develop a convergence rate for objective values. However, when the global model is in a locally convex area of $f$, we can obtain the relationship between the gradient and the local optimum.
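As a numerical sanity check of Lemma 2 above (not a proof, and not part of the original supplement), the sketch below samples symmetric 2x2 matrices $A$ and sets $B = A + vv^\top$, so that $B - A$ is positive semidefinite and $A \preceq B$. It then checks that both the smallest and the largest eigenvalue of $A$ are no larger than those of $B$. The helper names `sym2_eigs` and `check_lemma2`, the sampling ranges, and the tolerance are illustrative assumptions.

```python
import math
import random


def sym2_eigs(a, b, c):
    """Return (min, max) eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]]."""
    mid = (a + c) / 2.0
    rad = math.hypot((a - c) / 2.0, b)
    return mid - rad, mid + rad


def check_lemma2(trials=1000, seed=0, tol=1e-9):
    """Sample A and B = A + v v^T (so B - A is PSD) and test both inequalities."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = (rng.uniform(-5.0, 5.0) for _ in range(3))
        v1, v2 = rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)
        lo_a, hi_a = sym2_eigs(a, b, c)
        # Entries of B = A + v v^T, hence A <= B in the Loewner order.
        lo_b, hi_b = sym2_eigs(a + v1 * v1, b + v1 * v2, c + v2 * v2)
        if lo_a > lo_b + tol or hi_a > hi_b + tol:
            return False
    return True


lemma2_holds = check_lemma2()
```

The closed-form 2x2 eigenvalue formula keeps the check free of external dependencies; a larger-dimension check would need a linear algebra library.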
reviewers find our proposed method, claims and empirical methodology to be correct (R1
We would like to thank the reviewers for their comments and positive outlook on the paper. We hope our response clarifies all concerns. We thank the reviewer for the valuable suggestion. As shown, our method outperforms previous work by a large margin using fewer parameters. Therefore, VB-Routing still suffers from the efficiency drawbacks mentioned in Section 1.1.
Unlocking Noise-Resistant Vision: Key Architectural Secrets for Robust Models
Kim, Bum Jun, Kawano, Makoto, Iwasawa, Yusuke, Matsuo, Yutaka
While the robustness of vision models is often measured, their dependence on specific architectural design choices is rarely dissected. We investigate why certain vision architectures are inherently more robust to additive Gaussian noise and convert these empirical insights into simple, actionable design rules. Specifically, we performed extensive evaluations on 1,174 pretrained vision models, empirically identifying four consistent design patterns for improved robustness against Gaussian noise: larger stem kernels, smaller input resolutions, average pooling, and supervised vision transformers (ViTs) rather than CLIP ViTs, which yield up to 506 rank improvements and 21.6\%p accuracy gains. We then develop a theoretical analysis that explains these findings, converting observed correlations into causal mechanisms. First, we prove that low-pass stem kernels attenuate noise with a gain that decreases quadratically with kernel size and that anti-aliased downsampling reduces noise energy roughly in proportion to the square of the downsampling factor. Second, we demonstrate that average pooling is unbiased and suppresses noise in proportion to the pooling window area, whereas max pooling incurs a positive bias that grows slowly with window size and yields a relatively higher mean-squared error and greater worst-case sensitivity. Third, we reveal and explain the vulnerability of CLIP ViTs via a pixel-space Lipschitz bound: The smaller normalization standard deviations used in CLIP preprocessing amplify worst-case sensitivity by up to 1.91 times relative to the Inception-style preprocessing common in supervised ViTs. Our results collectively disentangle robustness into interpretable modules, provide a theory that explains the observed trends, and build practical, plug-and-play guidelines for designing vision models more robust against Gaussian noise.
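Two of the claims in the abstract above can be illustrated with a small simulation, offered here as a hedged sketch rather than the authors' analysis. The first function feeds pure Gaussian noise (clean signal assumed to be zero) through a k x k pooling window: for i.i.d. noise, the average-pooled output has zero mean and variance sigma^2 / k^2, while the max-pooled output has a positive mean. The second function assumes that per-channel normalization `(x - mean) / std` scales a pixel perturbation by `1 / std`, and compares the worst channel under the commonly used CLIP standard deviations with an Inception-style standard deviation of 0.5. The constants and the function names are assumptions, not values taken from the paper.

```python
import random
import statistics

# Assumed preprocessing constants (not stated in the abstract):
# widely used CLIP per-channel std values and an Inception-style std of 0.5.
CLIP_STD = (0.26862954, 0.26130258, 0.27577711)
INCEPTION_STD = (0.5, 0.5, 0.5)


def pooled_noise_stats(k=2, sigma=1.0, trials=20000, seed=0):
    """Simulate avg and max pooling over a k x k window of pure Gaussian noise."""
    rng = random.Random(seed)
    n = k * k
    avg_out, max_out = [], []
    for _ in range(trials):
        window = [rng.gauss(0.0, sigma) for _ in range(n)]
        avg_out.append(sum(window) / n)
        max_out.append(max(window))
    return {
        "avg_bias": statistics.fmean(avg_out),
        "avg_mse": statistics.fmean([x * x for x in avg_out]),
        "max_bias": statistics.fmean(max_out),
        "max_mse": statistics.fmean([x * x for x in max_out]),
    }


def normalization_gain_ratio(std_a=CLIP_STD, std_b=INCEPTION_STD):
    """Ratio of worst-case per-pixel gains 1 / min(std) between two preprocessors."""
    return (1.0 / min(std_a)) / (1.0 / min(std_b))


stats = pooled_noise_stats()
ratio = normalization_gain_ratio()
```

Under these assumptions, `ratio` comes out close to the 1.91 factor quoted in the abstract, and `stats["avg_mse"]` should be near sigma^2 / k^2 while `stats["max_bias"]` is clearly positive.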
sufficiently accurate, the solution will eventually become monotonic. In practice, we found that we usually find
We thank all the reviewers. We hope the reviewers will consider increasing their ratings if our response addresses their concerns. Theoretically, in Eq. (11), if we take Accordingly, we have revised the relevant sections of the paper by adding pertinent technical details. General Comments: All the learned models whose performance we report are certified monotonic. We will make a note of this in the paper; see also G#1 and G#2.